Learning Better Name Translation for Cross-Lingual Wikification
Abstract
A notable challenge in cross-lingual wikification is retrieving English Wikipedia title candidates for a non-English mention, a step that requires translating names written in a foreign language into English. Creating training data for name translation requires a significant amount of human effort. To cover as many languages as possible, we propose a probabilistic model that leverages indirect supervision signals in a knowledge base. More specifically, the model learns name translation from title pairs obtained from the inter-language links in Wikipedia, and it jointly considers word alignment and word transliteration. Compared with 6 other approaches on 9 languages, we show that the proposed model outperforms the others not only on the transliteration metric but also in its ability to generate target English titles for a cross-lingual wikifier. Consequently, as we show, it improves the end-to-end performance of a cross-lingual wikifier on the TAC 2016 EDL dataset.
Similar resources
Cross-lingual Wikification Using Multilingual Embeddings
Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. This task involves the problem of comparing textual clues across languages, which requires developing a notion of similarity between text snippets across languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wiki...
Towards Cross-lingual Patent Wikification
This paper demonstrates the effectiveness of cross-lingual patent wikification, which links technical terms in a patent application document to their corresponding Wikipedia articles in different languages. The number of links increases definitely because different language versions of Wikipedia cover different sets of technical terms. We present an experiment of Japanese-to-English cross-lingu...
A Single-step Machine Learning Approach to Link Detection in Wikipedia: NTCIR Crosslink-2 Experiments at KSLP
This study describes a link detection method to find relevant cross-lingual links from Korean Wikipedia documents to English ones at term level. Earlier wikification approaches have used two independent steps for link disambiguation and link determination. This study seeks to merge these two separate steps into a single-step machine learning scheme. Our method at NTCIR-10 Korean-to-English CLLD t...
English-Persian Plagiarism Detection based on a Semantic Approach
Plagiarism, which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them”, poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories: direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...
Cross-Lingual Named Entity Recognition via Wikification
Named Entity Recognition (NER) models for language L are typically trained using annotated data in that language. We study cross-lingual NER, where a model for NER in L is trained on another, source, language (or multiple source languages). We introduce a language-independent method for NER, building on cross-lingual wikification, a technique that grounds words and phrases in non-English text in...